[TMA] Update TMAStoreWaitOp to wait for the memory write to complete by peterbell10 · Pull Request #10415 · triton-lang/triton

peterbell10 · 2026-05-29T17:24:06Z

Currently desc.store(...) does not guarantee that the write has completed to global memory, which makes message passing impossible.

e.g.

```
desc.store(...)
tl.atomic_xchg(flag, 1, sem="release")
```

does not release the store.

To maintain perf in Gluon, I also expose the read-only TMA wait variant so users can explicitly opt in to the behavior.
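To make the hazard concrete, here is a minimal sketch in plain Python. It is a deterministic toy model, not Triton or PTX semantics: `ToyGlobalMemory`, `async_store`, `wait_writes_complete`, and `release_flag` are hypothetical names invented for illustration. It assumes only what is described above, namely that the asynchronous store may still be in flight when the flag is published unless the producer first waits for the write to complete.

```python
from collections import deque


class ToyGlobalMemory:
    """Toy model: async (TMA-like) stores sit in a queue until drained."""

    def __init__(self):
        self.cells = {}         # values visible to other observers
        self.pending = deque()  # stores issued but not yet written

    def async_store(self, addr, value):
        # Stands in for desc.store(...): returns before the write is visible.
        self.pending.append((addr, value))

    def wait_writes_complete(self):
        # Stands in for a store wait that blocks until the writes have landed.
        while self.pending:
            addr, value = self.pending.popleft()
            self.cells[addr] = value

    def release_flag(self, addr, value):
        # Stands in for the flag update. In this toy model (an assumption),
        # it does not order the pending async stores.
        self.cells[addr] = value

    def load(self, addr):
        return self.cells.get(addr, 0)


def producer(mem, wait_before_flag):
    mem.async_store("data", 42)
    if wait_before_flag:
        mem.wait_writes_complete()
    mem.release_flag("flag", 1)


def consumer(mem):
    # Reads the payload only once the flag has been observed.
    if mem.load("flag") == 1:
        return mem.load("data")
    return None


broken = ToyGlobalMemory()
producer(broken, wait_before_flag=False)
stale = consumer(broken)  # flag is set, payload is still pending

fixed = ToyGlobalMemory()
producer(fixed, wait_before_flag=True)
fresh = consumer(fixed)   # payload landed before the flag was published
```

In the first run the consumer sees the flag but reads the initial value of `data`; in the second run the wait drains the store before the flag is published, so the consumer reads the payload.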

ThomasRaoux

LGTM

ThomasRaoux

Actually, thinking more about this: if it is true, it means something is broken in the acquire/release semantics of PTX ops. Are you sure the example you have doesn't work?

peterbell10 · 2026-05-29T20:10:24Z

https://triton-lang.slack.com/archives/C04CZ1MCL65/p1780084322002219

ThomasRaoux · 2026-05-29T20:15:01Z

https://triton-lang.slack.com/archives/C04CZ1MCL65/p1780084322002219

sad, anyway makes sense, hopefully it doesn't affect perf significantly

Use read-only waits for pipelined TMA stores so the in-loop wait only protects reuse of the shared-memory staging buffer. Skip TMA store pipelining when the loop contains acquire or release atomics so those memory-ordering cases keep the non-pipelined descriptor store lowering. This recovers some of the performance regression caused by [#10415](#10415).
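The pipelining rule described above can be sketched as a simple predicate. This is a toy model in plain Python, not the compiler pass itself: the `Op` record, the op names, and `can_pipeline_tma_store` are hypothetical, and treating `acq_rel` as an ordering semantic alongside acquire and release is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    # Hypothetical op record for illustration; not a Triton IR class.
    name: str
    sem: Optional[str] = None  # memory semantic for atomics, else None


# Including "acq_rel" here is an assumption; the text names acquire and release.
ORDERING_SEMS = {"acquire", "release", "acq_rel"}


def can_pipeline_tma_store(loop_ops):
    """Return False if any atomic in the loop carries acquire/release ordering."""
    return not any(
        op.name.startswith("atomic") and op.sem in ORDERING_SEMS
        for op in loop_ops
    )


# A loop whose only atomic is relaxed may still be pipelined in this model.
plain_loop = [Op("descriptor_store"), Op("atomic_add", sem="relaxed")]

# A loop that publishes a flag with release ordering keeps the
# non-pipelined lowering in this model.
flag_loop = [Op("descriptor_store"), Op("atomic_xchg", sem="release")]
```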

peterbell10 requested a review from ThomasRaoux May 29, 2026 17:24

peterbell10 requested a review from ptillet as a code owner May 29, 2026 17:24

ThomasRaoux approved these changes May 29, 2026

ThomasRaoux reviewed May 29, 2026

peterbell10 merged commit 02480ad into main May 29, 2026
10 checks passed

peterbell10 deleted the pb/tma-store-wait branch May 29, 2026 20:18

ThomasRaoux mentioned this pull request Jun 2, 2026

[BACKEND] Use read-only waits for pipelined TMA stores #10444

Merged

peterbell10 merged 1 commit into main from pb/tma-store-wait
